Supplementary for: "GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization"

Neural Information Processing Systems

We organize our supplementary document as follows:

1. Results on an additional dataset
2. Results for limited data settings on the YFCC26k and GWS15k datasets
3. Additional ablations
   (a) Gallery size
   (b) Queue length
   (c) σ_η for batch GPS noise
   (d) σ_η for queue GPS noise
   (e) σ for Random Fourier Features
   (f) Number of hierarchies (M)
4. Different selection choices for GPS gallery construction
   (a) Evenly spaced GPS coordinates
   (b) Test set GPS coordinates
5. Analysis of runtime and memory footprint
6. Motivations for using pretrained CLIP as the image encoder backbone
7. Qualitative demonstration
   (a) Hierarchical learning in our location encoder L(·)
   (b) GeoCLIP with image query
   (c) Distribution of correct predictions of GeoCLIP on different datasets
   (d) GeoCLIP with text query
8. Discussion on ethical issues and possible mitigation

In Section 4.1 of the main paper, we demonstrated the performance of our GeoCLIP method on the Im2GPS3k [2] and GWS15k [1] datasets and compared it with state-of-the-art methods. Here, we perform experiments on another dataset, YFCC26k [6]. The results are provided in Table 1. They show that GeoCLIP performs well across datasets with different data distributions.

GeoCLIP achieves decent performance across datasets even when the training data is significantly reduced. We showed the efficacy of GeoCLIP on limited training samples of Im2GPS3k in Section 4.2 of the main paper. Here, we further investigate the performance of GeoCLIP in limited data settings on other datasets (YFCC26k and GWS15k).




Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators

arXiv.org Artificial Intelligence

The insatiable appetite of Artificial Intelligence (AI) workloads for computing power is pushing the industry to develop faster and more efficient accelerators. The rigidity of custom hardware, however, conflicts with the need for scalable and versatile architectures capable of catering to the evolving and heterogeneous pool of Machine Learning (ML) models in the literature. In this context, multi-chiplet architectures that assemble multiple (perhaps heterogeneous) accelerators are an appealing option, but one that is hindered by chip-to-chip interconnects that remain rigid and inefficient. In this paper, we explore the potential of wireless technology as a complement to existing wired interconnects in this multi-chiplet approach. Using a state-of-the-art evaluation framework, we show that wireless interconnects can lead to speedups of 10% on average and up to 20%. We also highlight the importance of load balancing between the wired and wireless interconnects, which will be further explored in future work.


Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering

arXiv.org Artificial Intelligence

We introduce Gradient Agreement Filtering (GAF) to improve on gradient averaging in distributed deep learning optimization. Traditional distributed data-parallel stochastic gradient descent averages the gradients of microbatches to calculate a macrobatch gradient that is then used to update model parameters. We find that gradients across microbatches are often orthogonal or negatively correlated, especially in late stages of training, which leads to memorization of the training set and reduces generalization. In this paper, we introduce a simple, computationally efficient way to reduce gradient variance by computing the cosine distance between micro-gradients during training and filtering out conflicting updates prior to averaging. We improve validation accuracy with significantly smaller microbatch sizes. We also show that this reduces memorization of noisy labels. We demonstrate the effectiveness of this technique on standard image classification benchmarks including CIFAR-100 and CIFAR-100N-Fine. We show that this technique consistently improves validation accuracy, in some cases by up to 18.2% compared to traditional training approaches, while reducing the computation required by nearly an order of magnitude, because we can now rely on smaller microbatch sizes without destabilizing training.
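
The filtering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold value `max_cosine_distance`, the choice to always keep the first micro-gradient, and the rule of comparing each micro-gradient against the running average are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b): 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # assumption: treat a zero vector as orthogonal
    return 1.0 - dot / norm


def agreement_filtered_mean(micro_grads, max_cosine_distance=0.97):
    """Average micro-gradients, skipping those that conflict with the running mean.

    Hypothetical rule: the first micro-gradient is always kept; each later one
    is kept only if its cosine distance to the running mean is small enough.
    Returns (mean_gradient, number_of_gradients_kept).
    """
    running = [float(x) for x in micro_grads[0]]
    kept = 1
    for g in micro_grads[1:]:
        if cosine_distance(running, g) <= max_cosine_distance:
            # Incremental mean over the gradients kept so far.
            running = [(r * kept + x) / (kept + 1) for r, x in zip(running, g)]
            kept += 1
    return running, kept


# Two roughly aligned micro-gradients and one pointing the opposite way.
mean, kept = agreement_filtered_mean([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
```

In this toy call the third micro-gradient points almost exactly against the running mean, so it is filtered out and only the first two are averaged.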


Development of Image Collection Method Using YOLO and Siamese Network

arXiv.org Artificial Intelligence

As we enter the era of big data, collecting high-quality data is very important. However, collecting data by hand is both time-consuming and expensive. Therefore, many researchers have devised methods to collect data using computers. One such method is web crawling, but the authors found that crawling collects unintended data along with the data the user intended. The authors found that this can be filtered using the object recognition model YOLOv10. However, some data is still not properly filtered. Here, image reclassification was performed by additionally using the distance output from a Siamese network, and higher performance was recorded than with other classification models (average F1 score: YOLO+MobileNet 0.678 -> YOLO+SiameseNet 0.772). The user can specify a distance threshold to adjust the balance between data deficiency and noise-robustness. The authors also found that the Siamese network can achieve higher performance with fewer resources because the cropped images are used for object recognition when processing images in the Siamese network (Class 20 mean-based F1 score: non-crop+Siamese(MobileNetV3-Small) 80.94 -> crop preprocessing+Siamese(MobileNetV3-Small) 82.31). In this way, an image retrieval system that uses two consecutive models to reduce errors can save users' time and effort, and build better-quality data faster and with fewer resources than before.
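
The role of the user-specified distance threshold can be sketched as follows. This is only an illustration under stated assumptions: the embeddings are hand-written stand-ins for Siamese network outputs, Euclidean distance is assumed as the distance measure, and `filter_by_distance` is a hypothetical helper, not part of the authors' system.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_by_distance(reference, candidates, threshold):
    """Keep the names of candidates whose embedding is within `threshold` of the reference."""
    kept = []
    for name, embedding in candidates:
        if euclidean(reference, embedding) <= threshold:
            kept.append(name)
    return kept


# Stand-in embeddings; a real pipeline would obtain them from the Siamese network.
reference = [0.0, 0.0]
candidates = [
    ("a.jpg", [0.1, 0.0]),
    ("b.jpg", [0.6, 0.8]),
    ("c.jpg", [3.0, 4.0]),
]

strict = filter_by_distance(reference, candidates, threshold=0.5)
loose = filter_by_distance(reference, candidates, threshold=1.5)
```

A smaller threshold keeps fewer images (less noise, but possibly too little data), while a larger threshold keeps more images at the cost of admitting more noise.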


BankTweak: Adversarial Attack against Multi-Object Trackers by Manipulating Feature Banks

arXiv.org Artificial Intelligence

Multi-object tracking (MOT) aims to construct moving trajectories for objects, and modern multi-object trackers mainly utilize the tracking-by-detection methodology. Initial approaches to MOT attacks primarily aimed to degrade the detection quality of the frames under attack, thereby reducing accuracy only in those specific frames, highlighting a lack of efficiency. To improve efficiency, recent advancements manipulate object positions to cause persistent identity (ID) switches during the association phase, even after the attack ends within a few frames. However, these position-manipulating attacks have inherent limitations, as they can be easily counteracted by adjusting distance-related parameters in the association phase, revealing a lack of robustness. In this paper, we present BankTweak, a novel adversarial attack designed for MOT trackers, which features efficiency and robustness. BankTweak focuses on the feature extractor in the association phase and reveals a vulnerability in the Hungarian matching method used by feature-based MOT systems. Exploiting the vulnerability, BankTweak induces persistent ID switches (addressing efficiency) even after the attack ends by strategically injecting altered features into the feature banks without modifying object positions (addressing robustness). To demonstrate the applicability, we apply BankTweak to three multi-object trackers (DeepSORT, StrongSORT, and MOTDT) with one-stage, two-stage, anchor-free, and transformer detectors. Extensive experiments on the MOT17 and MOT20 datasets show that our method substantially surpasses existing attacks, exposing the vulnerability of the tracking-by-detection framework to BankTweak.


GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

arXiv.org Artificial Intelligence

Subset selection is a challenging optimization problem with a wide variety of applications in machine learning, including feature selection, recommender systems, news aggregation, drug discovery, data summarization, and designing pretraining sets for large language models (Anil et al., 2023). Data sampling in particular is a salient problem due to unprecedented and continuous data collection. For example, LiDAR and imaging devices in one self-driving vehicle can easily capture ~80 terabytes of data per day (Kazhamiaka et al., 2021). In most subset selection tasks, we rely on the weight (or utility) of the objects to rank one over the other, and also to avoid selecting duplicate or near-duplicate objects. If we select a small subset, then we also want to ensure that the selected subset is a good representation of the original set. These utility, diversity, and coverage criteria can be expressed through objective functions, and the interesting research lies in developing efficient algorithms with strong approximation guarantees. The underlying machinery used in constrained subset selection algorithms shares many similarities with techniques from other areas of combinatorial optimization such as submodular maximization, k-center clustering, and convex hull approximations. In this work, we study the problem of selecting a set of points in a metric space that maximizes an objective that combines their utility and a minimum pairwise-distance diversity measure.
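
To make the idea of combining utility with a minimum pairwise-distance diversity term concrete, here is a minimal sketch in the spirit of the title's "greedy independent set thresholding". It is an illustration under assumptions rather than the paper's algorithm: the weighting `alpha`, the use of mean weight as utility, the Euclidean metric, and all function names are hypothetical, since this text does not give the exact objective.

```python
import math


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (inf for fewer than two)."""
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = min(best, math.dist(points[i], points[j]))
    return best


def greedy_threshold_select(points, weights, k, threshold):
    """Pick up to k indices by descending weight, keeping all pairwise distances >= threshold."""
    order = sorted(range(len(points)), key=lambda i: -weights[i])
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(math.dist(points[i], points[j]) >= threshold for j in chosen):
            chosen.append(i)
    return chosen


def objective(points, weights, chosen, alpha=0.5):
    """Assumed objective: alpha * mean weight + (1 - alpha) * min pairwise distance."""
    utility = sum(weights[i] for i in chosen) / len(chosen)
    diversity = min_pairwise_distance([points[i] for i in chosen])
    return alpha * utility + (1 - alpha) * diversity


def best_over_thresholds(points, weights, k, thresholds, alpha=0.5):
    """Try each distance threshold and return (score, chosen) for the best subset found."""
    best = None
    for t in thresholds:
        chosen = greedy_threshold_select(points, weights, k, t)
        if len(chosen) < 2:
            continue  # diversity is undefined for a single point in this sketch
        score = objective(points, weights, chosen, alpha)
        if best is None or score > best[0]:
            best = (score, chosen)
    return best


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0)]
weights = [1.0, 0.9, 0.5]
score, chosen = best_over_thresholds(points, weights, k=2, thresholds=[0.0, 1.0])
```

With no distance constraint the two heaviest points are chosen even though they are near-duplicates; with a threshold of 1.0 the second pick is the distant, lighter point, which scores higher under the assumed objective.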


A GRASP-based memetic algorithm with path relinking for the far from most string problem

arXiv.org Artificial Intelligence

Such problems have attracted a lot of interest for multiple reasons. From a theoretical (and even from a purely algorithmic) point of view, they constitute a clear and well-defined domain in which computational complexity issues can be analyzed and search/optimization algorithms can be put to work in challenging conditions. From a more practical point of view, there are many real-world problems which can be formalized as SSPs. Such problems are notably found in the area of computational biology, in which technological advances and the numerous initiatives are producing an unprecedented flood of data (Reichhardt, 1999) very much requiring the use of powerful computational tools to overcome the associated challenges (Meneses et al., 2005). Among such problems of interest from the perspective of SSPs we can cite discovering potential drug targets, creating diagnostic probes, designing primers, locating binding sites, or identifying consensus sequences just to name a few (Festa, 2007; Lanctot et al., 2003; Meneses et al., 2005).


Multi-agent statistical discriminative sub-trajectory mining and an application to NBA basketball

arXiv.org Artificial Intelligence

Improvements in tracking technology through optical and computer vision systems have enabled a greater understanding of the movement-based behaviour of multiple agents, including in team sports. In this study, a Multi-Agent Statistically Discriminative Sub-Trajectory Mining (MA-Stat-DSM) method is proposed that takes a set of binary-labelled agent trajectory matrices as input and incorporates Hausdorff distance to identify sub-matrices that statistically significantly discriminate between the two groups of labelled trajectory matrices. Utilizing 2015/16 SportVU NBA tracking data, agent trajectory matrices representing attacks, each consisting of the trajectories of five agents (the ball, shooter, last passer, shooter defender, and last passer defender), were truncated to correspond to the time interval following the receipt of the ball by the last passer, and labelled as effective or ineffective based on a definition of attack effectiveness that we devise in the current study. After identifying appropriate parameters for MA-Stat-DSM by iteratively applying it to all matches involving the two top- and two bottom-placed teams from the 2015/16 NBA season, the method was applied to selected matches and could identify and visualize the portions of plays, e.g., involving passing, on-, and/or off-the-ball movements, which were most relevant in rendering attacks effective or ineffective.
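
For readers unfamiliar with the Hausdorff distance mentioned in this abstract, the sketch below computes it between two trajectories given as lists of 2D points. It shows only the standard definition with Euclidean point distances; how MA-Stat-DSM applies it to multi-agent trajectory matrices is not described here, and the function names are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two toy trajectories: the second one has an extra point far from the first.
traj_a = [(0.0, 0.0), (1.0, 0.0)]
traj_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
distance = hausdorff(traj_a, traj_b)
```

The directed distance from `traj_a` to `traj_b` is zero because every point of `traj_a` also appears in `traj_b`, so the symmetric value is driven entirely by the extra point in `traj_b`.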